feat(helm): add pod extensibility controls#380

Open
thekkagent wants to merge 7 commits into openabdev:main from thekkagent:feat/helm-chart-extensibility

Conversation

@thekkagent

@thekkagent thekkagent commented Apr 16, 2026

What problem does this solve?

OpenAB's Helm chart covers the minimal install path well, but it is too thin for many real Kubernetes deployments.

It does not currently expose several common deployment controls that operators typically expect from a reusable open-source chart, including:

  • imagePullSecrets
  • probes (livenessProbe, readinessProbe, startupProbe)
  • lifecycle
  • initContainers
  • extraContainers
  • extra volumes / volume mounts
  • serviceAccountName binding
  • chart-managed ServiceAccount with IRSA support
  • Role / ClusterRole RBAC
  • PodDisruptionBudget
  • other pod-level and cluster-level deployment settings that are common in production clusters

This leaves users with a few unattractive options:

  1. rebuild a custom image for relatively minor deployment-specific needs
  2. patch rendered manifests after helm template
  3. fork the chart just to add standard Kubernetes fields

A concrete example: operators who need agents to access AWS resources (via IRSA) or the Kubernetes API currently have to fork the chart to add ServiceAccount annotations and RBAC rules.

Closes: #397

Discord Discussion URL: https://discordapp.com/channels/1491295327620169908/1493841502529523732

At a Glance

┌─────────────────────────────────────┐
│ Current OpenAB Helm Chart           │
│ minimal install path                │
└──────────────┬──────────────────────┘
               │
               ▼
┌─────────────────────────────────────┐
│ Missing common chart hooks          │
│ - imagePullSecrets                  │
│ - probes / lifecycle                │
│ - initContainers / extraContainers  │
│ - extra volumes / mounts            │
│ - ServiceAccount + RBAC             │
│ - PodDisruptionBudget               │
│ - polymorphic env (valueFrom)       │
└──────────────┬──────────────────────┘
               │
               ▼
┌─────────────────────────────────────┐
│ This PR                             │
│ pod extensibility + SA/RBAC/PDB     │
│ + env + template fixes              │
└─────────────────────────────────────┘

Prior Art & Industry Research

OpenClaw:

I reviewed the local OpenClaw repository and its Kubernetes deployment manifests. While OpenClaw does not currently ship a Helm chart in this repo, it does treat startup bootstrap as a first-class deployment concern. In particular, it uses an initContainer to prepare configuration and workspace state before the main container starts.

This suggests:

  • initContainers are a reasonable place for startup preparation
  • bootstrap logic should be treated as deployment design, not an ad hoc workaround
  • a hardened main container can stay simpler when initialization is separated

Reference:

  • scripts/k8s/manifests/deployment.yaml

Hermes Agent:

I reviewed Hermes Agent's Docker and deployment documentation. Hermes takes a clearer stance on tool installation: stable toolchains should primarily be handled through custom images or clearly defined mutable runtime environments, not by stretching the chart into a package manager.

This suggests:

  • serious toolchains should prefer custom images
  • runtime bootstrap can exist, but it should stay lightweight and bounded
  • Helm should expose deployment extension points, not replace image design

References:

  • Dockerfile
  • website/docs/user-guide/docker.md
  • website/docs/getting-started/nix-setup.md

Other references:

I reviewed Bitnami charts and Helm / Kubernetes best practices. Bitnami's more mature charts commonly expose imagePullSecrets, serviceAccount, probes, lifecycle, initContainers, extraContainers, extraVolumes, extraVolumeMounts, extraEnvVarsCM, extraEnvVarsSecret, extraDeploy, and pdb. This PR adopts the subset of that surface area that fits under Deployment.spec.template plus the essential cluster-level resources (ServiceAccount, Role, ClusterRole, PDB).

Relevant upstream guidance also supports this direction:

  • Helm chart values and template best practices
  • Kubernetes guidance for initContainers
  • Kubernetes guidance for private registry pulls via imagePullSecrets
  • Kubernetes RBAC best practices for namespaced vs cluster-scoped access

Proposed Solution

This PR implements full pod / deployment extensibility plus chart-managed ServiceAccount, RBAC, and PodDisruptionBudget. Discord feedback from @tonylee led to expanding beyond the original Phase 1 scope to cover AWS IRSA + K8s API access scenarios.

Files changed and their purpose:

  • charts/openab/values.yaml: Helm values for pod extensibility, serviceAccount, rbac, and podDisruptionBudget
  • charts/openab/templates/_helpers.tpl: helpers for per-agent image pull policy/secrets, ServiceAccount name resolution (create + external fallback), reserved-key-safe pod metadata merging, and a unified discord.enabled gate
  • charts/openab/templates/deployment.yaml: renders pod-level extensibility controls into Deployment.spec.template
  • charts/openab/templates/configmap.yaml: filters map-typed env from config.toml, escapes string env via toJson, uses the unified discord gate
  • charts/openab/templates/secret.yaml: uses the unified discord gate helper
  • charts/openab/templates/serviceaccount.yaml: per-agent ServiceAccount (opt-in), with annotations + automount control
  • charts/openab/templates/role.yaml: per-agent namespaced Role (opt-in)
  • charts/openab/templates/rolebinding.yaml: per-agent RoleBinding (opt-in)
  • charts/openab/templates/clusterrole.yaml: per-agent ClusterRole (opt-in)
  • charts/openab/templates/clusterrolebinding.yaml: per-agent ClusterRoleBinding (opt-in)
  • charts/openab/templates/pdb.yaml: per-agent PodDisruptionBudget (opt-in)
  • charts/openab/tests/deployment_test.yaml: pod extensibility coverage (17 tests)
  • charts/openab/tests/env_test.yaml: env polymorphic rendering + configmap filtering (5 tests)
  • charts/openab/tests/serviceaccount_test.yaml: ServiceAccount creation coverage (8 tests)
  • charts/openab/tests/rbac_test.yaml: RBAC coverage (9 tests)
  • charts/openab/tests/pdb_test.yaml: PodDisruptionBudget coverage (6 tests)

Implemented scope:

Pod / Deployment extensibility

  • imagePullSecrets (global + per-agent, with explicit opt-out, supports both K8s native and shorthand formats)
  • per-agent imagePullPolicy (global default + per-agent override)
  • per-agent serviceAccountName binding (for referencing externally-created SAs)
  • livenessProbe, readinessProbe, startupProbe, lifecycle
  • initContainers, extraContainers
  • extraVolumes, extraVolumeMounts
  • podAnnotations, podLabels (global + per-agent, merged with reserved-key protection)
  • polymorphic env rendering (string values + valueFrom maps via kindIs "map")
  • TOML env value escaping via toJson (fixes pre-existing injection risk)
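The polymorphic env rendering described above can be sketched as a Go-template branch. This is an illustrative sketch, not the chart's exact code — the context variable ($cfg) and indentation are assumptions:

```yaml
# Hypothetical excerpt of deployment.yaml; names are illustrative.
env:
  {{- range $key, $val := $cfg.env }}
  {{- if kindIs "slice" $val }}
  {{- fail (printf "env %s: list values are not supported" $key) }}
  {{- else if kindIs "map" $val }}
  - name: {{ $key }}
    {{- toYaml $val | nindent 4 }}   {{/* valueFrom blocks render verbatim */}}
  {{- else }}
  - name: {{ $key }}
    value: {{ $val | toJson }}       {{/* toJson quotes and escapes strings */}}
  {{- end }}
  {{- end }}
```

The kindIs checks route each entry: strings become value:, maps become valueFrom-style blocks, and lists fail loudly.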

Chart-managed resources (Phase 2)

  • ServiceAccount (opt-in via serviceAccount.create, with annotations for IRSA and automountServiceAccountToken control)
  • namespaced Role + RoleBinding (opt-in via rbac.create)
  • cluster-scoped ClusterRole + ClusterRoleBinding (opt-in via rbac.createClusterRole)
  • PodDisruptionBudget (opt-in via podDisruptionBudget.enabled, supports both minAvailable and maxUnavailable)
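Taken together, the opt-in switches might look like this in values.yaml (an illustrative fragment; the key names follow the PR description, the IRSA role ARN is an example):

```yaml
serviceAccount:
  create: true
  annotations:
    # IRSA: bind the SA to an AWS IAM role (example ARN)
    eks.amazonaws.com/role-arn: arn:aws:iam::123456789012:role/my-agent
  automountServiceAccountToken: true

rbac:
  create: true                # namespaced Role + RoleBinding
  createClusterRole: false    # cluster-scoped resources stay off by default

podDisruptionBudget:
  enabled: true
  minAvailable: 1             # mutually exclusive with maxUnavailable
```

Everything defaults to false/off, so omitting this block reproduces the pre-PR behavior.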

Template consistency

  • unified discord.enabled gate via openab.discordEnabled helper (consistent behavior across configmap, deployment, secret)

Explicitly out of scope:

  • extra ConfigMap / Secret creation — already covered by existing extraVolumes/extraVolumeMounts/envFrom, operators create their own K8s resources and reference them
  • generic extra objects such as extraDeploy — deferred

For the "install tools" question specifically, the implementation keeps two clear paths:

  1. Custom image — the preferred path for stable, repeatable, production-grade toolchains
  2. initContainers + shared volume — a lightweight bootstrap path for small binaries or startup initialization
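The second path can be sketched as a per-agent values fragment (hypothetical names and URL, for illustration only):

```yaml
# Lightweight bootstrap: fetch a small binary into a shared emptyDir
# before the main container starts.
initContainers:
  - name: fetch-tool
    image: busybox:1.36
    command:
      - sh
      - -c
      - wget -O /tools/mytool https://example.com/mytool && chmod +x /tools/mytool
    volumeMounts:
      - name: tools
        mountPath: /tools
extraVolumes:
  - name: tools
    emptyDir: {}
extraVolumeMounts:
  - name: tools
    mountPath: /tools
```

Anything heavier than this belongs in a custom image.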

Safety & correctness

Three design decisions are load-bearing for correctness.

Reserved pod-metadata keys cannot be hijacked. agentPodLabels and agentPodAnnotations use mergeOverwrite(dict, global, per-agent, reserved), where the reserved key set is merged last. This guarantees:

  • checksum/config annotation is always the chart-computed hash — users cannot accidentally pin it or disable rollout-on-config-change
  • app.kubernetes.io/{name,instance,component} labels on the pod template always match spec.selector.matchLabels, so Deployment → Pod selector matching can never be broken by a user-supplied podLabels entry

Without this protection, a user who set podLabels.app.kubernetes.io/name: custom would silently produce duplicate YAML keys and a Deployment whose ReplicaSet could no longer match its own Pods.
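The reserved-last merge can be sketched as a helper (a simplified sketch under assumed helper names — the chart's actual signatures may differ):

```yaml
{{/* Hypothetical _helpers.tpl excerpt: reserved keys are merged LAST,
     so they overwrite any colliding global or per-agent entry. */}}
{{- define "openab.agentPodLabels" -}}
{{- $reserved := fromYaml (include "openab.selectorLabels" .) -}}
{{- $merged := mergeOverwrite (dict)
      (.Values.podLabels | default dict)
      (.cfg.podLabels | default dict)
      $reserved -}}
{{- toYaml $merged -}}
{{- end -}}
```

Because mergeOverwrite applies sources left to right, placing $reserved last guarantees chart-computed labels always win.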

Per-agent imagePullSecrets: [] explicitly opts out of the global list. The helper uses hasKey instead of a truthy check, so three states are distinguishable per agent:

  • key omitted → inherit .Values.imagePullSecrets
  • imagePullSecrets: [] → opt out (no pull secrets, even if global is set)
  • imagePullSecrets: [foo] → replace with per-agent list
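The three states above hinge on hasKey, which distinguishes "key absent" from "key present but empty" — a truthy check would collapse the first two. A sketch (illustrative variable names):

```yaml
{{/* Hypothetical helper logic for per-agent imagePullSecrets. */}}
{{- $secrets := .Values.imagePullSecrets -}}
{{- if hasKey $cfg "imagePullSecrets" -}}
{{- $secrets = $cfg.imagePullSecrets -}}  {{/* [] opts out; [foo] replaces */}}
{{- end -}}
{{- with $secrets }}
imagePullSecrets:
  {{- range . }}
  {{- if kindIs "map" . }}
  - name: {{ .name }}        {{/* K8s native format: - name: regcred */}}
  {{- else }}
  - name: {{ . }}            {{/* shorthand format: - regcred */}}
  {{- end }}
  {{- end }}
{{- end }}
```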

All new resources are opt-in with safe defaults. serviceAccount.create, rbac.create, rbac.createClusterRole, and podDisruptionBudget.enabled all default to false. Existing deployments see no behavioral change on upgrade — only new resources are created when operators explicitly opt in.

Backward compatibility for ServiceAccount binding. The existing agents.<x>.serviceAccountName field still works for referencing externally-created SAs. When serviceAccount.create: true is set, the chart-created SA takes precedence over the external reference.

Why this approach?

I do not think OpenAB should model every Kubernetes field at once, and I do not think Helm should become the primary mechanism for packaging arbitrary tools.

At the same time, the current chart is thin enough that users are pushed toward forking it for fairly normal deployment requirements. That creates unnecessary friction for a public chart.

This approach takes the middle path:

  • keep the default chart simple — all new controls are opt-in
  • expose the extension points that operators commonly expect
  • keep custom images as the primary answer for serious tool installation
  • treat initContainers bootstrap as a lightweight complement, not the main packaging model
  • make chart-managed RBAC available for teams who need agents to access cloud resources or the K8s API, without requiring a fork

Tradeoffs and limitations:

  • the chart surface will grow
  • pod-metadata merge rules must be unambiguous and predictable — addressed by the reserved-last mergeOverwrite pattern
  • PodDisruptionBudget combined with the hardcoded replicas: 1 has a known edge case (minAvailable: 1 prevents node drains) — documented with a comment in values.yaml

Alternatives Considered

1. Pod extensibility only; defer SA/RBAC/PDB — initially chosen, then expanded

The first iteration of this PR scoped to Deployment.spec.template only, deferring SA/RBAC/PDB. After Discord feedback (specifically operators needing IRSA + K8s API access), the scope was expanded to include chart-managed resources since those are the exact gaps that otherwise force a fork.

2. Implement everything at once — chosen

Including SA/RBAC/PDB in the same PR keeps related extensibility changes together. All new resources follow the existing per-agent multi-resource loop pattern and share the same helpers (e.g. agentFullname, selectorLabels). All defaults are opt-in, so the risk profile is no higher than the Phase 1 changes.

3. Extra ConfigMap / Secret creation — rejected

The existing extraVolumes/extraVolumeMounts/envFrom fields already cover this use case. Users create their own ConfigMaps/Secrets (with their preferred lifecycle management) and reference them. Adding chart-managed extras would duplicate functionality without clear benefit.

Validation

  • helm lint charts/openab passes
  • Rendering coverage added for:
    • imagePullSecrets (global + per-agent, both K8s native and shorthand formats)
    • per-agent imagePullPolicy
    • initContainers, extraContainers
    • extraVolumes / extraVolumeMounts
    • probes (livenessProbe, readinessProbe, startupProbe), lifecycle
    • serviceAccountName + chart-managed ServiceAccount
    • ServiceAccount automountServiceAccountToken + IRSA annotations
    • Role + RoleBinding + ClusterRole + ClusterRoleBinding
    • PodDisruptionBudget (minAvailable, maxUnavailable, percentage values)
    • pod annotations and labels (global + per-agent merged)
    • polymorphic env rendering (string, valueFrom, mixed)
  • Hardening coverage added for:
    • reserved selector labels (app.kubernetes.io/{name,component}) cannot be hijacked by user podLabels
    • checksum/config annotation cannot be overridden by user podAnnotations
    • per-agent imagePullSecrets: [] opts out of the global list
    • per-agent podLabels override global for the same non-reserved key
    • list-typed env values rejected with clear error message
    • map-typed env values filtered from config.toml (TOML cannot represent valueFrom)
    • PDB fails loudly when both minAvailable and maxUnavailable are set
    • PDB fails loudly when neither is set
  • Backward compatibility:
    • existing serviceAccountName still resolves to external SA
    • serviceAccount.create: true overrides external reference when both are set
  • Manual deploy-oriented verification on a real cluster

Validation command:

helm unittest charts/openab

Result: 61 passed, 0 failed (across 8 suites)

DC: https://discord.com/channels/1491295327620169908/1493841502529523732/1494148867380219956

@thekkagent thekkagent requested a review from thepagent as a code owner April 16, 2026 01:28
@github-actions github-actions bot added the closing-soon PR missing Discord Discussion URL — will auto-close in 3 days label Apr 16, 2026
@github-actions github-actions bot removed the closing-soon PR missing Discord Discussion URL — will auto-close in 3 days label Apr 16, 2026
@thekkagent
Author

Follow-up: polymorphic env rendering

This came up in Discord feedback — the current env map only supports simple key: "string" pairs, which makes it impossible to inject downward API fields or reference external secrets. For example, there's no way to pass metadata.name as an env var to the agent container.

Added kindIs map type detection so both formats work without breaking existing usage:

env:
  # existing usage — unchanged
  SIMPLE_VAR: "hello"
  # new — valueFrom and other complex types
  MY_POD_NAME:
    valueFrom:
      fieldRef:
        fieldPath: metadata.name

Changes:

  • deployment.yaml: env range uses kindIs "map" to branch between value: and toYaml
  • values.yaml: updated comment examples
  • helm-template-test.sh: added Test 19-21 (string, valueFrom, mixed)

Contributor

@masami-agent masami-agent left a comment


Review — PR #380

Well-structured PR, @thekkagent. The phased approach (pod/deployment extensibility only, deferring PDB/RBAC) is the right call. The reserved-key protection and imagePullSecrets opt-out design are particularly thoughtful.

✅ What looks good

  1. Reserved label/annotation protection: mergeOverwrite with reserved keys merged last prevents selector label hijacking and checksum/config override. This is a real safety concern that many charts get wrong.

  2. imagePullSecrets three-state design — key omitted (inherit global), [] (opt out), [foo] (override). Using hasKey instead of truthy check is correct.

  3. Polymorphic env rendering: the kindIs "map" check allows both MY_VAR: "hello" and MY_VAR: { valueFrom: ... }. Backward compatible with existing string-only env.

  4. Test coverage — 14 new tests (Tests 8-21) covering all new features plus hijack protection. The test harness fix (PASS=$((PASS + 1)) instead of ((PASS++))) for set -e compatibility is a nice catch.

  5. No src/ changes — pure Helm chart, lowest risk to runtime.

  6. Scope discipline — explicitly lists what is out of scope (PDB, RBAC, extraDeploy).

🔴 Must fix before merge

1. sidecars should be extraContainers or sidecarContainers

sidecars: []

The Bitnami convention uses sidecars but Kubernetes 1.28+ has native sidecar support via initContainers with restartPolicy: Always. Using sidecars as a field name that renders into containers: could confuse users who expect native sidecar behavior. Consider renaming to extraContainers to be explicit about where they render (in the containers array, not as native sidecars).

2. Missing Closes # in PR description

The PR description says Closes: N/A. If there is a related issue, link it. If not, consider opening one for traceability.

🟡 Non-blocking

3. values.yaml default agent block has new fields at the bottom

The new per-agent fields (serviceAccountName, podAnnotations, podLabels, probes, lifecycle, initContainers, sidecars, extraVolumes, extraVolumeMounts) are added after resources: {} but before nodeSelector. This is fine, but grouping them with a comment header (e.g. # Pod extensibility) would improve readability.

4. Empty object defaults vs omitted

livenessProbe: {}
readinessProbe: {}
startupProbe: {}
lifecycle: {}

These default to {} in values.yaml. The templates use {{- with $cfg.livenessProbe }} which treats {} as falsy in Helm, so they won't render. This is correct behavior, but a brief comment in values.yaml noting that {} means "disabled" would help operators.
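For illustration, the pattern in question (a sketch of the template shape, not the chart's exact code):

```yaml
# Helm's `with` treats an empty map as falsy, so the {} default
# renders nothing — the probe is effectively disabled.
{{- with $cfg.livenessProbe }}
livenessProbe:
  {{- toYaml . | nindent 2 }}
{{- end }}
```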

5. Test file is getting large (232 new lines)

The test script is now ~320 lines. Not blocking, but consider splitting into separate test files by feature area in a follow-up (e.g. test-extensibility.sh, test-bot-messages.sh).

📝 Summary

Priority | Item | Status
🔴 | Rename sidecars to extraContainers | Needs fix
🔴 | Missing issue link | Needs fix
🟡 | Comment header for new values fields | Non-blocking
🟡 | Document {} means disabled for probes | Non-blocking
🟡 | Test file size | Non-blocking

Solid work overall. The reserved-key protection pattern is something other charts should learn from. Two small fixes and this is ready to approve.

@masami-agent
Contributor

@thekkagent Before we can proceed with approval, three requests:

1. Rename sidecars → extraContainers
K8s 1.28+ introduced native sidecars (initContainers with restartPolicy: Always). Your field renders into the main containers array, not as native sidecars. extraContainers makes this explicit and avoids confusion.

2. Post a rendered manifest
Please run helm template with these values enabled and paste the output:

  • imagePullSecrets
  • initContainers (at least one)
  • livenessProbe
  • extraVolumes / extraVolumeMounts
  • podLabels with a custom key

This lets us verify indentation and empty-value behavior in the final YAML without needing a local Helm install.

3. Confirm reserved key protection works
Run this and paste the result:

helm template test charts/openab/ \
  --set-json 'agents.kiro.podLabels={"app.kubernetes.io/name":"hacked"}' \
  | grep 'app.kubernetes.io/name'

Expected: output should show the chart-computed name (e.g. openab), not hacked. This confirms that user-supplied podLabels cannot break Deployment → Pod selector matching.

@thekkagent
Author

Addressing review feedback + follow-up improvements

Rebased on latest main (v0.7.7-beta.1) and pushed 4 clean commits:

1. feat(helm): add pod extensibility controls

Original PR scope — pod/deployment extensibility. Also fixes Test 1-7 to pass botToken after upstream discord.enabled gate change (#394).

2. feat(helm): support valueFrom and other complex env var types

From Discord feedback — the env map only supported simple key: "string" pairs, making it impossible to inject downward API fields or reference external secrets.

Added kindIs "map" type detection so both formats work:

env:
  SIMPLE_VAR: "hello"
  MY_POD_NAME:
    valueFrom:
      fieldRef:
        fieldPath: metadata.name
  • deployment.yaml: map values render via toYaml (e.g. valueFrom blocks)
  • configmap.yaml: map values are filtered out — config.toml only supports string env vars, valueFrom is handled at the pod level by Kubernetes
  • Tests 19-21 added (string, valueFrom, mixed)

3. refactor(helm): rename sidecars to extraContainers, add value comments

  • Renamed sidecars → extraContainers per review — this field renders into the main containers array, not as K8s 1.28+ native sidecars
  • Added # -- Pod extensibility comment header
  • Documented {} = disabled for probes and lifecycle

4. fix(helm): escape TOML env values, reject list env vars, support K8s imagePullSecrets format

  • Fixed pre-existing TOML injection risk where double quotes in env string values produced invalid TOML — now uses toJson for proper escaping
  • Added kindIs "slice" guard in both deployment.yaml and configmap.yaml to reject list-typed env values with a clear error message
  • Derived agentPodLabels reserved keys from selectorLabels helper to prevent future drift between Deployment selector and pod labels
  • imagePullSecrets now supports both K8s native object format (- name: regcred) and shorthand string format (- regcred)
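To make the TOML injection fix concrete, here is the failure mode and the escaped result. The GREETING variable is a hypothetical example; toJson emits a quoted, backslash-escaped literal, which is also a valid TOML basic string:

```toml
# Before (raw interpolation): a value containing a double quote breaks the file
env = { GREETING = "say "hi"" }     # invalid TOML

# After (toJson-escaped): the inner quotes are escaped
env = { GREETING = "say \"hi\"" }   # valid TOML basic string
```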

Re: test file size

Agreed — will split test files by feature area in a follow-up PR.

Rendered manifest verification

helm template with imagePullSecrets, initContainers, livenessProbe, extraVolumes, podLabels, and polymorphic env
---
# Source: openab/templates/secret.yaml
apiVersion: v1
kind: Secret
metadata:
  name: test-openab-kiro
  labels:
    helm.sh/chart: openab-0.7.7-beta.1
    app.kubernetes.io/name: openab
    app.kubernetes.io/instance: test
    app.kubernetes.io/component: kiro
    app.kubernetes.io/version: "0.7.7-beta.1"
    app.kubernetes.io/managed-by: Helm
  annotations:
    "helm.sh/resource-policy": keep
type: Opaque
data:
  discord-bot-token: "dGVzdA=="
---
# Source: openab/templates/configmap.yaml
apiVersion: v1
kind: ConfigMap
metadata:
  name: test-openab-kiro
  labels:
    helm.sh/chart: openab-0.7.7-beta.1
    app.kubernetes.io/name: openab
    app.kubernetes.io/instance: test
    app.kubernetes.io/component: kiro
    app.kubernetes.io/version: "0.7.7-beta.1"
    app.kubernetes.io/managed-by: Helm
data:
  config.toml: |
    [discord]
    bot_token = "${DISCORD_BOT_TOKEN}"
    allowed_channels = ["YOUR_CHANNEL_ID"]
    allowed_users = []
    allow_bot_messages = "off"

    [agent]
    command = "kiro-cli"
    args = ["acp","--trust-all-tools"]
    working_dir = "/home/agent"
    env = { SIMPLE_VAR = "hello" }

    [pool]
    max_sessions = 10
    session_ttl_hours = 24

    [reactions]
    enabled = true
    remove_after_reply = false
---
# Source: openab/templates/pvc.yaml
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: test-openab-kiro
  labels:
    helm.sh/chart: openab-0.7.7-beta.1
    app.kubernetes.io/name: openab
    app.kubernetes.io/instance: test
    app.kubernetes.io/component: kiro
    app.kubernetes.io/version: "0.7.7-beta.1"
    app.kubernetes.io/managed-by: Helm
spec:
  accessModes:
    - ReadWriteOnce
  resources:
    requests:
      storage: 1Gi
---
# Source: openab/templates/deployment.yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: test-openab-kiro
  labels:
    helm.sh/chart: openab-0.7.7-beta.1
    app.kubernetes.io/name: openab
    app.kubernetes.io/instance: test
    app.kubernetes.io/component: kiro
    app.kubernetes.io/version: "0.7.7-beta.1"
    app.kubernetes.io/managed-by: Helm
spec:
  replicas: 1
  strategy:
    type: Recreate
  selector:
    matchLabels:
      app.kubernetes.io/name: openab
      app.kubernetes.io/instance: test
      app.kubernetes.io/component: kiro
  template:
    metadata:
      annotations:
        checksum/config: 3a5a7cd5cc371b23056d7db4bf43ee5bcee71347f0d1297d8b866a9023199911
      labels:
        app.kubernetes.io/component: kiro
        app.kubernetes.io/instance: test
        app.kubernetes.io/name: openab
        team: platform
    spec:
      imagePullSecrets:
        - name: "ghcr-secret"
      securityContext:
        fsGroup: 1000
        runAsGroup: 1000
        runAsNonRoot: true
        runAsUser: 1000
      initContainers:
        - command:
          - sh
          - -c
          - echo init
          image: busybox:1.36
          name: bootstrap
      containers:
        - name: openab
          image: "ghcr.io/openabdev/openab:0.7.7-beta.1"
          imagePullPolicy: IfNotPresent
          securityContext:
            allowPrivilegeEscalation: false
            capabilities:
              drop:
              - ALL
          env:
            - name: DISCORD_BOT_TOKEN
              valueFrom:
                secretKeyRef:
                  name: test-openab-kiro
                  key: discord-bot-token
            - name: HOME
              value: /home/agent
            - name: MY_POD_NAME
              valueFrom:
                fieldRef:
                  fieldPath: metadata.name
            - name: SIMPLE_VAR
              value: "hello"
          livenessProbe:
            httpGet:
              path: /healthz
              port: 8080
          volumeMounts:
            - name: config
              mountPath: /etc/openab
              readOnly: true
            - name: data
              mountPath: /home/agent
            - mountPath: /scratch
              name: scratch
      volumes:
        - name: config
          configMap:
            name: test-openab-kiro
        - name: data
          persistentVolumeClaim:
            claimName: test-openab-kiro
        - emptyDir: {}
          name: scratch

Reserved key protection verification

$ helm template test charts/openab/ \
  --set-json 'agents.kiro.podLabels={"app.kubernetes.io/name":"hacked"}' \
  | grep 'app.kubernetes.io/name'

    app.kubernetes.io/name: openab
    app.kubernetes.io/name: openab
    app.kubernetes.io/name: openab
      app.kubernetes.io/name: openab
        app.kubernetes.io/name: openab

User-supplied podLabels cannot override reserved selector labels. All 22 tests pass.

Contributor

@masami-agent masami-agent left a comment


Re-review — PR #380

All three requested items addressed. Verified the latest commits:

✅ Requested changes (resolved)

  1. sidecars → extraContainers — renamed in values.yaml, deployment.yaml, and tests ✓
  2. Rendered manifest posted — indentation correct, imagePullSecrets / initContainers / livenessProbe / extraVolumes / polymorphic env all render properly ✓
  3. Reserved key protection verified — app.kubernetes.io/name stays openab even when the user sets hacked. Protection works ✓

✅ Bonus improvements

  • TOML injection fix — env values with double quotes now properly escaped via toJson
  • kindIs "slice" guard — list-typed env values rejected with clear error
  • imagePullSecrets dual format — supports both K8s native - name: regcred and shorthand - regcred
  • Reserved labels derived from selectorLabels — prevents future drift
  • Rebased on main (v0.7.7-beta.1)
  • 22 tests pass

Summary

High-quality contribution. The reserved-key protection, TOML injection fix, and polymorphic env support are all well-designed. Ready for maintainer approval.

LGTM.

@obrutjack
Collaborator

Thorough work. Reviewed the full diff and the rendered manifest output.

What I like:

  • Reserved key protection derived from selectorLabels helper — prevents drift, smart design
  • TOML injection fix with toJson — caught a pre-existing risk
  • kindIs "slice" guard with clear fail message — good DX
  • imagePullSecrets dual format support — pragmatic
  • 22 tests passing, rendered manifest looks clean

One question before I approve: the configmap.yaml change filters out kindIs "map" env values from config.toml (since TOML can't represent valueFrom). What happens if an operator sets ALL env vars as valueFrom maps? Does config.toml still render a valid [agent] section with no env line, or does it produce env = {}?

thekkagent and others added 6 commits April 17, 2026 10:00
Add pod/deployment extensibility to the Helm chart, focused on
Deployment.spec.template scope: imagePullSecrets (global + per-agent
with opt-out), per-agent imagePullPolicy, serviceAccountName, probes,
lifecycle, initContainers, sidecars, extraVolumes, extraVolumeMounts,
podAnnotations and podLabels with reserved-key protection.

Also fixes Test 1-7 to pass botToken after upstream discord.enabled
gate change.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
The env map previously only supported simple key-value strings. This
adds polymorphic rendering via kindIs "map" so users can use valueFrom
(fieldRef, secretKeyRef, configMapKeyRef, etc.) while maintaining full
backward compatibility with existing string values.

In deployment.yaml, map-typed values render as toYaml (e.g. valueFrom
blocks). In configmap.yaml, map-typed values are filtered out since
config.toml only supports string env vars — valueFrom is a Kubernetes
concept handled at the pod level.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Rename `sidecars` to `extraContainers` to avoid confusion with K8s 1.28+
native sidecar support — this field renders into the main containers
array, not as native sidecars.

Also adds a "Pod extensibility" comment header and documents that `{}`
means disabled for probes and lifecycle fields.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
…imagePullSecrets format

- Fix pre-existing TOML injection risk where double quotes in env string
  values produced invalid TOML. Use toJson for proper escaping.
- Add kindIs "slice" guard in both deployment.yaml and configmap.yaml to
  reject list-typed env values with a clear error message.
- Derive agentPodLabels reserved keys from selectorLabels helper to
  prevent future drift between Deployment selector and pod labels.
- Support both K8s native object format ({ name: regcred }) and
  shorthand string format for imagePullSecrets.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Replace the deleted helm-template-test.sh with helm unittest YAML files:
- deployment_test.yaml: pod extensibility controls (18 tests)
- env_test.yaml: polymorphic env rendering + configmap filtering (5 tests)

Upstream already migrated to helm unittest (configmap_test.yaml,
adapter-enablement_test.yaml). Total: 36 tests across 5 suites.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
… helper

Before: configmap used strict truthy check (`if ($cfg.discord).enabled`)
while deployment and secret used default-true (`ne toString "false"`).
This asymmetry came from upstream fix commits and caused different
behavior for edge cases across templates.

Introduces `openab.discordEnabled` helper following the same pattern
as `openab.agentEnabled` and `openab.persistenceEnabled`. Returns
"false" when discord config is absent or explicitly disabled, "true"
otherwise. All three templates (configmap, deployment, secret) now
use the helper for consistent behavior.

Also strengthens the reserved-label-hijack test to assert both
`app.kubernetes.io/name` and `app.kubernetes.io/component` are
protected.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Contributor

@masami-agent masami-agent left a comment


Re-review — PR #380

Previous review: APPROVED (after the sidecars → extraContainers rename, rendered manifest verification, and reserved key protection confirmation). This re-review covers the full 4-commit diff against current main.

Summary

  • Problem: Helm chart lacks common deployment controls (imagePullSecrets, probes, lifecycle, initContainers, extraContainers, extra volumes, serviceAccount, pod annotations/labels). Operators are forced to fork the chart for standard K8s deployment needs.
  • Approach: Phase 1 — pod/deployment extensibility only (Deployment.spec.template scope). Defers PDB, RBAC, and generic extra objects.
  • Risk level: Medium (chart surface area grows significantly, but all new fields are opt-in with safe defaults)

Core Assessment

  1. Problem clearly stated: ✅ (linked #397, thorough prior art research)
  2. Approach appropriate: ✅ (phased, focused on Deployment.spec.template only)
  3. Alternatives considered: ✅ (minimum-only rejected, everything-at-once rejected, middle path chosen)
  4. Best approach for now: ✅

Findings

Architecture — well-designed:

  • Reserved key protection via mergeOverwrite(dict, global, per-agent, reserved) with reserved merged last — prevents selector label hijacking and checksum/config override
  • agentPodLabels derives reserved keys from selectorLabels helper — prevents future drift
  • imagePullSecrets three-state design (hasKey check): omitted → inherit global, [] → opt out, [x] → replace
  • Both K8s native format ({name: regcred}) and shorthand string format supported
  • Polymorphic env: kindIs "map" branches between value: and toYaml (valueFrom), kindIs "slice" rejects lists with clear error
  • TOML injection fix: toJson escaping for env string values — catches a pre-existing risk on main
  • configmap.yaml correctly filters out map-typed env values (TOML can't represent valueFrom)
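
The polymorphic env branching described above could be sketched as follows (variable names are illustrative, not the chart's exact code):

```yaml
{{- range $name, $val := $agent.env }}
- name: {{ $name }}
  {{- if kindIs "map" $val }}
  {{- toYaml $val | nindent 2 }}  {{/* e.g. valueFrom: secretKeyRef */}}
  {{- else if kindIs "slice" $val }}
  {{- fail (printf "env var %q must not be a list" $name) }}
  {{- else }}
  value: {{ $val | quote }}
  {{- end }}
{{- end }}
```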

Tests — comprehensive (22 tests):

  • Tests 8-18: pod extensibility (imagePullSecrets, imagePullPolicy, initContainers, extraContainers, volumes, probes, lifecycle, serviceAccount, annotations, labels)
  • Tests 15-18: hardening (reserved label hijack, checksum hijack, imagePullSecrets opt-out, per-agent override)
  • Tests 19-21: env polymorphism (string, valueFrom, mixed)
  • Test harness bug fix: ((PASS++)) → PASS=$((PASS + 1)) for POSIX compatibility ✅
  • Tests 1-7 fixed to pass botToken after upstream discord.enabled gate change ✅

Rendered manifest verified (posted by contributor): indentation correct, imagePullSecrets/initContainers/livenessProbe/extraVolumes/podLabels all render properly.

Reserved key protection verified: app.kubernetes.io/name: hacked → still renders as openab
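
The merge order behind that protection can be sketched as (helper name and value paths taken from the review text; `global.podLabels` and the variable names are assumptions):

```yaml
{{- $reserved := fromYaml (include "openab.selectorLabels" $) -}}
{{- $podLabels := mergeOverwrite (dict) ($.Values.global.podLabels | default dict) ($agent.podLabels | default dict) $reserved -}}
{{- /* reserved keys are merged last, so user-supplied values can never override them */ -}}
```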

Review Summary

🔴 Blockers

  • Branch is 29 commits behind main — must rebase before merge.

💬 Questions

  • obrutjack's unanswered question: What happens when ALL env vars are valueFrom maps? I can answer this from reading the diff: $stringEnv will be empty, the {{- if $stringEnv }} guard skips the env = { } line entirely, and config.toml renders a valid [agent] section without an env field. OpenAB's Rust config uses #[serde(default)] for env, so this is correct — no env line = empty HashMap. @thekkagent please confirm.

🔧 Suggested Changes

  • None — all previous review items have been addressed.

ℹ️ Info

  • This is the largest PR in the review queue (+389/-10), but the scope is cohesive (all Deployment.spec.template).
  • The test file is growing (now 22 tests in one shell script). Contributor acknowledged this and plans to split by feature area in a follow-up PR.
  • values.yaml defaults are all safe: probes/lifecycle default to {} (disabled), initContainers/extraContainers/extraVolumes default to [] (empty), imagePullSecrets default to [] (none).

⚪ Nits

  • None

Verdict

APPROVE (pending rebase). Code quality is high, safety design is thoughtful, test coverage is thorough. The only blocker is the 29-commit rebase against main.

Collaborator

@obrutjack obrutjack left a comment

@thekkagent Following up on my question — I checked the configmap template and config.rs myself. When all env vars are valueFrom maps, $stringEnv is empty and the env = { ... } line is omitted entirely from config.toml. The env field in Rust has #[serde(default)], so missing env defaults to empty HashMap. No parse error. Correct behavior.

Splitting PDB/RBAC into a follow-up PR is the right call.

LGTM — approving.

@thekkagent thekkagent force-pushed the feat/helm-chart-extensibility branch from 0e70a04 to 8dad081 on April 17, 2026 at 08:22
@thekkagent
Author

thekkagent commented Apr 17, 2026

⚠️ Scope expanded + rebased — fresh re-review needed

Heads up @masami-agent @obrutjack — your earlier approvals were on a smaller scope. This push expands the PR significantly. Happy to split into two PRs if the maintainers prefer, but the scope is coherent so I've kept it together.

1. Rebased on latest main

Resolves the 29-commit gap. Handled upstream's helm-template-test.sh → helm-unittest migration, discord.enabled default-true change, and Slack adapter additions without conflicts.

2. Test framework migration

Upstream removed helm-template-test.sh and switched to helm-unittest YAML (configmap_test.yaml, adapter-enablement_test.yaml + chart-test.yml GHA workflow). I followed suit:

  • tests/deployment_test.yaml — 17 tests (pod extensibility)
  • tests/env_test.yaml — 5 tests (polymorphic env + configmap filtering)
  • tests/serviceaccount_test.yaml — 8 tests (new)
  • tests/rbac_test.yaml — 9 tests (new)
  • tests/pdb_test.yaml — 6 tests (new)

Total: 61 tests pass across 8 suites.

3. Unified discord.enabled gate

Noticed an asymmetry after rebase — configmap.yaml used strict truthy check while deployment.yaml and secret.yaml used default-true. Introduced openab.discordEnabled helper (same pattern as agentEnabled, persistenceEnabled) applied to all three templates.

4. 🆕 Scope expansion: chart-managed ServiceAccount + RBAC + PDB

This is the meaningful scope change. Discord feedback from @tonylee (paraphrased: "my agents need to access AWS resources and the K8s cluster, so ServiceAccount + Role setup is required") made it clear that pod extensibility alone still forces users to fork for the most common production scenarios.

Given that the RBAC/PDB work follows the same per-agent multi-resource loop pattern and uses the same helpers (agentFullname, selectorLabels), bundling them was lower cognitive overhead than splitting. All new resources are opt-in with safe defaults (serviceAccount.create, rbac.create, rbac.createClusterRole, podDisruptionBudget.enabled all default to false).

Added templates:

  • serviceaccount.yaml — with annotations (IRSA-ready) and automountServiceAccountToken control
  • role.yaml + rolebinding.yaml — namespaced RBAC
  • clusterrole.yaml + clusterrolebinding.yaml — cluster-scope RBAC (for K8s API access across namespaces)
  • pdb.yaml — supports both minAvailable and maxUnavailable, with fail-loud validation

Backward compatibility: the existing agents.<x>.serviceAccountName field still works for referencing externally-created SAs. When serviceAccount.create: true is set, the chart-created SA takes precedence.
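
For example, an IRSA-style setup might look like this (role ARN and key names are illustrative, not the chart's documented schema):

```yaml
serviceAccount:
  create: true
  annotations:
    eks.amazonaws.com/role-arn: arn:aws:iam::123456789012:role/openab-agent  # example ARN
  automountServiceAccountToken: true
```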

Known caveat: PDB with minAvailable: 1 combined with the hardcoded replicas: 1 will prevent node drains. Documented in values.yaml with a ⚠️ comment recommending maxUnavailable: 1 for drain support.
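
The drain-friendly configuration that values.yaml recommends would look roughly like:

```yaml
podDisruptionBudget:
  enabled: true
  maxUnavailable: 1   # with the hardcoded replicas: 1, minAvailable: 1 would block node drains
```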

5. Re: @obrutjack's question on all-valueFrom env

Confirmed. When every env var is a valueFrom map, $stringEnv is empty, the {{- if $stringEnv }} guard skips the env = {} line entirely, and config.toml renders a valid [agent] section without an env field. Rust config uses #[serde(default)] for env, so missing = empty HashMap. No parse error.
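
A sketch of the rendered config.toml in that case (everything except the omitted env field is a placeholder):

```toml
[agent]
# ...other agent fields render as usual...
# no `env = { ... }` line is emitted; #[serde(default)] on the Rust side
# turns the missing field into an empty HashMap
```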

6. Also added (minor)

  • /component assertion in the reserved-label-hijack test (previously only /name was asserted)

Commits

e657e86 feat(helm): add ServiceAccount, RBAC, and PodDisruptionBudget templates
8dad081 refactor(helm): unify discord.enabled check via openab.discordEnabled helper
9e1e5aa test(helm): migrate bash tests to helm unittest YAML format
f6ad77c fix(helm): escape TOML env values, reject list env vars, support K8s imagePullSecrets format
c0bd68f refactor(helm): rename sidecars to extraContainers, add value comments
58a0b73 feat(helm): support valueFrom and other complex env var types
deb71f1 feat(helm): add pod extensibility controls

Ready for re-review. If the scope is too much for one PR, I can split commit e657e86 out into a follow-up.

Phase 2 of helm chart extensibility (openabdev#397):

- ServiceAccount (optional, opt-in via serviceAccount.create)
  - Supports custom name, annotations (e.g. IRSA), and automountServiceAccountToken
  - Backward compat: existing serviceAccountName still works for external SAs
- Role + RoleBinding (optional, opt-in via rbac.create)
- ClusterRole + ClusterRoleBinding (optional, opt-in via rbac.createClusterRole)
- PodDisruptionBudget (optional, opt-in via podDisruptionBudget.enabled)
  - Supports both minAvailable and maxUnavailable (mutually exclusive)
  - values.yaml warns against minAvailable: 1 with the hardcoded replicas: 1

All resources are per-agent and follow the existing multi-agent loop pattern.
All defaults are opt-in (false) to preserve existing behavior.
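
For reference, those opt-in defaults as a values.yaml sketch:

```yaml
serviceAccount:
  create: false        # existing serviceAccountName references keep working
rbac:
  create: false
  createClusterRole: false
podDisruptionBudget:
  enabled: false
```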

25 new tests added across 3 suites (serviceaccount, rbac, pdb).
Total: 61 tests pass.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
@chaodu-agent
Collaborator

Hi @thekkagent — heads up on a direction change.

Per offline discussion among maintainers, the plan is to move the Helm charts out of this main repo and into a dedicated repository (e.g. openabdev/charts). The rationale is to decouple chart release cycles from the core application, which is a common pattern for projects with growing chart complexity.

This PR has solid work — the extensibility design, reserved-key protection, polymorphic env, RBAC/PDB templates, and test coverage are all well done. None of that effort is wasted.

However, since the charts are moving to a new repo, it would make more sense to land this PR there instead of updating the existing charts/ directory here. That way:

  1. The new repo starts with the full extensibility surface from day one
  2. We avoid a large merge into main that will be moved out shortly after
  3. Chart CI/CD (lint, unittest, release) can be set up independently in the new repo

We will share the new repo details once it is created. In the meantime, this PR can stay open for reference, or you can re-target it to the new repo when it is ready.

Thanks for the thorough work on this — looking forward to landing it in the right place.

Successfully merging this pull request may close these issues.

Add pod/deployment extensibility controls to Helm chart
